NSF PAR Search | NSF Public Access Repository

Solving Attention Kernel Regression Problem via Pre-conditioner

Song, Zhao; Yin, Junze; Zhang, Lichen (May 2024, Proceedings of Machine Learning Research)

Full Text Available
Fast Dynamic Sampling for Determinantal Point Processes

Song, Zhao; Yin, Junze; Zhang, Lichen; Zhang, Ruizhe (May 2024, Proceedings of Machine Learning Research)

Full Text Available
On Convergence of Federated Averaging Langevin Dynamics

Deng, Wei; Zhang, Qian; Ma, Yian; Song, Zhao; Lin, Guang (April 2024, 40th Conference on Uncertainty in Artificial Intelligence)

Full Text Available
On Convergence of Federated Averaging Langevin Dynamics

Deng, Wei; Zhang, Qian; Ma, Yian; Song, Zhao; Lin, Guang (April 2024, UAI Publisher)

We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data and study how the injected noise, the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize the communication cost. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine both independent and correlated noise across different clients in our FA-LD algorithm. We observe trade-offs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.
Full Text Available
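The scheme described in the abstract above (local Langevin steps on each client, followed by averaging) can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the quadratic client potentials, the equal client weights, the noise scaling, and all names (`fa_ld`, `client_means`, `local_steps`) are assumptions chosen so that the averaged iterate targets a standard-variance Gaussian centered at the mean of the client centers.

```python
import math
import random


def fa_ld(client_means, eta=0.05, local_steps=5, rounds=4000, seed=0):
    """Toy federated averaging Langevin dynamics in one dimension.

    Assumption: client c holds the potential f_c(x) = (x - mu_c)^2 / 2 and the
    global target is exp(-f) with f = (1/C) * sum_c f_c, i.e. N(mean(mu), 1).
    """
    rng = random.Random(seed)
    num_clients = len(client_means)
    # Independent per-client noise is inflated by sqrt(C) so that the
    # averaged iterate receives noise of variance 2 * eta per step.
    noise_scale = math.sqrt(2.0 * eta * num_clients)
    server = 0.0
    samples = []
    for _ in range(rounds):
        local = [server] * num_clients  # broadcast the server state
        for _ in range(local_steps):
            for c, mu in enumerate(client_means):
                grad = local[c] - mu  # gradient of f_c at the local iterate
                local[c] += -eta * grad + noise_scale * rng.gauss(0.0, 1.0)
        server = sum(local) / num_clients  # federated averaging step
        samples.append(server)
    return samples


def mean_and_variance(xs):
    """Empirical mean and (biased) variance of a list of floats."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


client_means = [-2.0, 0.0, 1.0, 3.0]  # heterogeneous (non-i.i.d.) clients
samples = fa_ld(client_means)
burn_in = len(samples) // 10  # discard the initial transient
est_mean, est_var = mean_and_variance(samples[burn_in:])
print(f"posterior mean ~ {est_mean:.3f}, variance ~ {est_var:.3f}")
```

Because the client gradients in this toy model are linear, averaging commutes with the local updates, so the example is too simple to exhibit the heterogeneity and partial-participation effects the abstract discusses.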
Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

https://doi.org/10.4230/LIPIcs.ITCS.2024.93

Song, Zhao; Zhang, Lichen; Zhang, Ruizhe (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Guruswami, Venkatesan (Ed.)
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width m is much larger than the data dimension d and the number of training samples n (m = poly(n,d)), which induces a prohibitively large weight matrix W ∈ ℝ^{m×m} per layer. Naively, one has to pay O(m²) time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that uses m² cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of m, i.e., m^{2-Ω(1)} per iteration. Our result has implications beyond standard over-parametrization theory, as it can be viewed as designing an efficient data structure on top of a pre-trained large model to further speed up the fine-tuning process, a core procedure for deploying large language models (LLMs).
Full Text Available
Convex Minimization with Integer Minima in Õ(n^4) Time

Jiang, Haotian; Lee, Yin Tat; Song, Zhao; Zhang, Lichen (January 2024, SIAM)

Full Text Available
A nearly-optimal bound for fast regression with ℓ∞ guarantee

Song, Zhao; Ye, Mingquan; Yin, Junze; Zhang, Lichen (July 2023, Proceedings of Machine Learning Research)

Given a matrix A ∈ ℝ^{n×d} and a vector b ∈ ℝ^n, we consider the regression problem with ℓ∞ guarantees: finding a vector x′ ∈ ℝ^d such that $$\|x'-x^*\|_\infty \leq \frac{\epsilon}{\sqrt{d}} \cdot \|Ax^*-b\|_2 \cdot \|A^\dagger\|$$, where x^* = arg min_{x∈ℝ^d} ‖Ax − b‖_2. One popular approach for solving such ℓ2 regression problems is via sketching: pick a structured random matrix S ∈ ℝ^{m×n} with m < n for which SA can be quickly computed, then solve the "sketched" regression problem arg min_{x∈ℝ^d} ‖SAx − Sb‖_2. In this paper, we show that in order to obtain such an ℓ∞ guarantee for ℓ2 regression, one has to use sketching matrices that are dense. To the best of our knowledge, this is the first use case in which dense sketching matrices are necessary. On the algorithmic side, we prove that there exists a distribution of dense sketching matrices with m = ε^{-2} d log^3(n/δ) such that solving the sketched regression problem gives the ℓ∞ guarantee with probability at least 1 − δ. Moreover, the matrix SA can be computed in time O(nd log n). Our row count is nearly optimal up to logarithmic factors and significantly improves the result in (Price et al., 2017), which requires a number of rows superlinear in d, m = Ω(ε^{-2} d^{1+γ}) for γ ∈ (0, 1). Moreover, we develop a novel analytical framework for ℓ∞ guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property introduced in (Song & Yu, 2021). Our analysis is much simpler and more general than that of (Price et al., 2017). Leveraging this framework, we extend the ℓ∞ guarantee regression result to dense sketching matrices for computing the fast tensor product of vectors.
Full Text Available
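The sketch-and-solve approach described in the abstract above can be illustrated with a small, self-contained example. It is a sketch under stated assumptions, not the paper's construction: it swaps in a plain dense Gaussian sketching matrix (which does not give the O(nd log n) time to form SA), uses the normal equations for the tiny solves, and all names and problem sizes are hypothetical.

```python
import math
import random


def solve_least_squares(A, b):
    """Return argmin_x ||Ax - b||_2 via the normal equations (small d only)."""
    d = len(A[0])
    # Build the augmented system [A^T A | A^T b].
    M = [[sum(row[i] * row[j] for row in A) for j in range(d)]
         + [sum(row[i] * bi for row, bi in zip(A, b))] for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(d):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


def gaussian_sketch(m, n, rng):
    """Dense m x n sketch with i.i.d. N(0, 1/m) entries (an assumed stand-in)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def matmul(S, A):
    """Dense matrix product S @ A for lists of rows."""
    cols = list(zip(*A))
    return [[sum(s * a for s, a in zip(row, col)) for col in cols] for row in S]


def matvec(S, b):
    """Dense matrix-vector product S @ b."""
    return [sum(s * x for s, x in zip(row, b)) for row in S]


rng = random.Random(0)
n, d, m = 200, 3, 60  # m < n rows survive the sketch
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
x_true = [1.0, -2.0, 0.5]
b = [sum(a * x for a, x in zip(row, x_true)) + 0.01 * rng.gauss(0.0, 1.0)
     for row in A]

x_star = solve_least_squares(A, b)  # exact least-squares solution
S = gaussian_sketch(m, n, rng)
x_sketch = solve_least_squares(matmul(S, A), matvec(S, b))  # sketched solution

# Coordinate-wise (l_inf) distance between the sketched and exact solutions.
linf_error = max(abs(u - v) for u, v in zip(x_sketch, x_star))
print("l_inf distance between sketched and exact solutions:", linf_error)
```

The quantity `linf_error` is the left-hand side of the ℓ∞ guarantee; the example only measures it on one random instance and does not verify the bound stated in the abstract.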
Sketching Meets Differential Privacy: Fast Algorithm for Dynamic Kronecker Projection Maintenance

Song, Zhao; Yang, Xin; Yang, Yuanyuan; Zhang, Lichen (July 2023, Proceedings of the 40th International Conference on Machine Learning, PMLR)

Projection maintenance is one of the core data structure tasks. Efficient data structures for projection maintenance have led to recent breakthroughs in many convex programming algorithms. In this work, we further extend this framework to the Kronecker product structure. Given a constraint matrix A and a positive semi-definite matrix W ∈ ℝ^{n×n} with a sparse eigenbasis, we consider the task of maintaining the projection in the form of B^⊤(BB^⊤)^{−1}B, where B = A(W⊗I) or B = A(W^{1/2}⊗W^{1/2}). At each iteration, the weight matrix W receives a low-rank change and we receive a new vector h. The goal is to maintain the projection matrix and answer the query B^⊤(BB^⊤)^{−1}Bh with good approximation guarantees. We design a fast dynamic data structure for this task that is robust against an adaptive adversary. Following the beautiful and pioneering work of [Beimel, Kaplan, Mansour, Nissim, Saranurak and Stemmer, STOC’22], we use tools from differential privacy to reduce the randomness required by the data structure and further improve the running time.
Full Text Available
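For orientation, the query B^⊤(BB^⊤)^{−1}Bh from the abstract above can be computed from scratch on a tiny instance. This is only the naive baseline that a dynamic data structure would aim to beat, not the paper's method; the matrices, sizes, and helper names below are hypothetical.

```python
def kron(X, Y):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[x * y for x in x_row for y in y_row]
            for x_row in X for y_row in Y]


def matmul(X, Y):
    """Dense matrix product X @ Y."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def solve(M, rhs):
    """Solve M y = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def project(B, h):
    """Return B^T (B B^T)^{-1} B h, the projection of h onto the row space of B."""
    Bt = transpose(B)
    Bh = [sum(b * x for b, x in zip(row, h)) for row in B]
    y = solve(matmul(B, Bt), Bh)  # y = (B B^T)^{-1} B h
    return [sum(b * v for b, v in zip(row, y)) for row in Bt]


W = [[2.0, 1.0], [1.0, 3.0]]   # positive definite weight matrix (n = 2)
I2 = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]     # constraint matrix with n^2 = 4 columns
B = matmul(A, kron(W, I2))     # B = A (W ⊗ I)
h = [1.0, 2.0, 3.0, 4.0]

Ph = project(B, h)
PPh = project(B, Ph)           # projecting twice should change nothing
print("projection of h:", Ph)
```

Every change to W forces this baseline to rebuild B and re-solve the linear system, which is the per-iteration cost that motivates maintaining the projection dynamically.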
Sketching Meets Differential Privacy: Fast Algorithm for Dynamic Kronecker Projection Maintenance

Song, Zhao; Yang, Xin; Yang, Yuanyuan; Zhang, Lichen (July 2023, Proceedings of Machine Learning Research)

Full Text Available
